
    A Bayesian-Deep Learning model for estimating Covid-19 evolution in Spain

    This work proposes a semi-parametric approach to estimating the evolution of Covid-19 (SARS-CoV-2) in Spain. Considering the sequences of 14-day cumulative incidence across all Spanish regions, it combines modern Deep Learning (DL) techniques for analyzing sequences with the usual Bayesian Poisson-Gamma model for counts. The DL model provides a suitable description of the observed sequences, but no reliable uncertainty quantification around it can be obtained. To overcome this, we use the DL prediction as an expert elicitation of the expected number of counts along with its uncertainty, and thus obtain the posterior predictive distribution of counts in an orthodox Bayesian analysis using the well-known Poisson-Gamma model. The resulting model allows us both to predict the future evolution of the sequences in all regions and to estimate the consequences of eventual scenarios. Comment: Related to: https://github.com/scabras/covid19-bayes-d

    A Markov chain representation of the multiple testing problem

    The problem of multiple hypothesis testing can be represented as a Markov process in which a new alternative hypothesis is accepted according to its evidence relative to the currently accepted one. This virtual, not formally observed process provides the most probable set of non-null hypotheses given the data; it plays the same role as Markov Chain Monte Carlo in approximating a posterior distribution. To apply this representation and obtain the posterior probabilities over all alternative hypotheses, it is enough to have, for each test, Bayes Factors defined only up to an unknown constant. Such Bayes Factors may arise either from using default and improper priors or from calibrating p-values with respect to their corresponding Bayes Factor lower bound. Both sources of evidence are used to form a Markov transition kernel on the space of hypotheses. The approach leads to easily interpretable results and involves very simple formulas, making it suitable for analyzing large datasets such as those arising from gene expression data (microarray or RNA-seq experiments).
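The acceptance mechanism described above can be sketched as a Metropolis walk over the hypothesis space. All numbers below are illustrative; the point is only that the unknown constant multiplying every Bayes Factor cancels in the acceptance ratio, so un-normalised evidence suffices.

```python
import random

random.seed(0)

# Unnormalised evidence for each alternative hypothesis: Bayes Factors known
# only up to a shared unknown constant (values are illustrative).
evidence = [0.5, 2.0, 1.0, 4.0]

def mh_over_hypotheses(evidence, n_steps=50000):
    """Metropolis walk on the hypothesis space; the unknown constant
    cancels in the acceptance ratio, which is what the representation exploits."""
    visits = [0] * len(evidence)
    state = 0
    for _ in range(n_steps):
        proposal = random.randrange(len(evidence))
        if random.random() < min(1.0, evidence[proposal] / evidence[state]):
            state = proposal
        visits[state] += 1
    return [v / n_steps for v in visits]

probs = mh_over_hypotheses(evidence)
# probs approximates evidence_i / sum(evidence), i.e. the posterior
# probabilities over hypotheses, without ever normalising the Bayes Factors.
```

The visit frequencies converge to the posterior probabilities in the same way an MCMC sample approximates a posterior distribution, as the abstract notes.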

    A Dirichlet Process Prior Approach for Covariate Selection

    This article belongs to the Special Issue "Bayesian Inference and Computation". The variable selection problem in general, and for the ordinary linear regression model in particular, is considered in the setup in which the number of covariates is large enough to prevent the exploration of all possible models. In this context, Gibbs sampling is needed to perform stochastic model exploration and estimate, for instance, the model inclusion probability. We show that, under a Bayesian non-parametric prior model for analyzing the Gibbs-sampling output, the usual empirical estimator is just the asymptotic version of the expected posterior inclusion probability given the simulation output. Other posterior conditional estimators of inclusion probabilities can also be considered, related to the latent probability distributions on the model space that can be sampled given the observed Gibbs-sampling output. This paper also compares, in this large model-space setup, the conventional prior approach against the non-local prior approach used to define the Bayes Factors for model selection. The approach is illustrated with simulated samples and an application to modeling Travel and Tourism factors worldwide. This research is supported by the Ministerio de Ciencia e Innovación of Spain, project PID2019-104790GB-I00.
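The two kinds of inclusion-probability estimators contrasted above can be sketched with hypothetical Gibbs indicators. The Beta(1, 1) model below only stands in for the paper's non-parametric prior as an illustration of conditioning on the simulation output.

```python
# Hypothetical 0/1 indicators of whether covariate j entered the model at
# each Gibbs iteration (illustrative output, not real simulation data).
draws = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Usual empirical estimator: the Monte Carlo inclusion frequency.
p_empirical = sum(draws) / len(draws)

# A simple conditional (posterior-mean) estimator: a Beta(1, 1) prior on the
# latent inclusion probability, updated with the Gibbs output, has posterior
# mean (1 + successes) / (2 + n). The paper's non-parametric treatment
# generalises this idea; the Beta model here is only a sketch.
p_posterior = (1 + sum(draws)) / (2 + len(draws))
```

As the number of Gibbs iterations grows, the conditional estimator converges to the empirical frequency, matching the asymptotic equivalence the abstract states.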

    American LNG and the EU-Russia Relationship: The End of Moscow’s Energy Weapon? College of Europe EU Diplomacy Paper 2/2021.

    This paper examines to what extent the shale revolution in the United States (US) and the new US position in the global energy market have impacted the European Union's (EU) gas market and its energy relationship with Russia. Making use of an analytical framework for studying energy interdependence, the paper notes that the EU has long promoted a liberal view of energy trade, founded on economic cooperation and market rules. On the contrary, Russia and the US have tended to adopt a realist perspective whereby energy is viewed as a strategic asset that can be deployed for geopolitical ends; Moscow, in particular, has been accused of using gas as a weapon to this end. The study finds that US liquefied natural gas (LNG) coming to the market as of 2019 has generated an oversupply and strengthened the position of EU buyers vis-à-vis Russia's Gazprom. Moreover, the innovative features introduced by US LNG have to some extent de-politicized the gas business and made the EU's market-oriented, liberal approach more effective. The paper concludes that, given today's abundance of gas supply options and the increasing competitiveness of renewable alternatives to natural gas, the success of US LNG in Europe will depend both on its price competitiveness and on whether the Biden Administration succeeds in reducing the greenhouse gas emissions associated with US LNG, making it compatible with the objectives of the European Green Deal.

    A Bayesian-Deep Learning model for estimating COVID-19 evolution in Spain

    This work proposes a semi-parametric approach to estimate the evolution of COVID-19 (SARS-CoV-2) in Spain. Considering the sequences of 14-day cumulative incidence of all Spanish regions, it combines modern Deep Learning (DL) techniques for analyzing sequences with the usual Bayesian Poisson-Gamma model for counts. The DL model provides a suitable description of the observed time series of counts, but it cannot give a reliable uncertainty quantification. In the proposed modelling approach, the DL predictions play the role of an expert elicitation of the expected number of counts and of its reliability. Finally, the posterior predictive distribution of counts is obtained in a standard Bayesian analysis using the well-known Poisson-Gamma model. The model makes it possible to predict the future evolution of the sequences in all regions or to estimate the consequences of eventual scenarios. The MINECO-Spain project PID2019-104790GB-I00 funded the author.
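The conjugate Poisson-Gamma update at the core of this approach can be sketched as follows. The DL output is represented by an assumed prior mean and variance of the incidence rate; all numbers are illustrative.

```python
# The DL prediction supplies the elicited mean m and variance v of the
# Poisson rate (hypothetical values standing in for the network's output).
m, v = 120.0, 400.0
a, b = m**2 / v, m / v          # Gamma(a, b) prior matching those two moments

counts = [130, 115, 125]        # newly observed daily counts (illustrative)
a_post = a + sum(counts)        # conjugate Poisson-Gamma update
b_post = b + len(counts)

post_mean = a_post / b_post     # posterior mean of the incidence rate
# The posterior predictive for a new count is Negative-Binomial with
# size a_post and success probability b_post / (1 + b_post), which is the
# closed-form uncertainty quantification the DL model alone cannot provide.
```

A sharper DL prediction (smaller v) yields a more concentrated Gamma prior, so the elicited reliability directly controls how much the posterior leans on the network's forecast.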

    Approximate Bayesian Computation by Modelling Summary Statistics in a Quasi-likelihood Framework

    Approximate Bayesian Computation (ABC) is a useful class of methods for Bayesian inference when the likelihood function is computationally intractable. In practice, the basic ABC algorithm may be inefficient in the presence of discrepancy between prior and posterior. Therefore, more elaborate methods, such as ABC with the Markov chain Monte Carlo algorithm (ABC-MCMC), should be used. However, the elaboration of a proposal density for MCMC is a sensitive issue and very difficult in the ABC setting, where the likelihood is intractable. We discuss an automatic proposal distribution useful for ABC-MCMC algorithms. This proposal is inspired by the theory of quasi-likelihood (QL) functions and is obtained by modelling the distribution of the summary statistics as a function of the parameters. Essentially, given a real-valued vector of summary statistics, we reparametrize the model by means of a regression function of the statistics on the parameters, obtained by sampling from the original model in a pilot-run simulation study. The QL theory is well established for a scalar parameter, and it is shown that when the conditional variance of the summary statistic is assumed constant, the QL has a closed-form normal density. This idea of constructing proposal distributions is extended to non-constant variance and to real-valued parameter vectors. The method is illustrated by several examples and by an application to a real problem in population genetics. Comment: Published at http://dx.doi.org/10.1214/14-BA921 in Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/).
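The pilot-run regression step can be sketched with a toy simulator. The model, parameter grid, and noise level below are assumptions for illustration, not the paper's examples; the sketch only shows how the statistic-on-parameter regression that drives the proposal is obtained.

```python
import random

random.seed(1)

def simulate_summary(theta):
    # Toy intractable-likelihood model: the summary statistic is a noisy
    # function of the parameter (true slope 2, noise sd 0.5, both assumed).
    return theta * 2.0 + random.gauss(0.0, 0.5)

# Pilot-run simulation study: sample (theta, s) pairs over a parameter grid.
pilot = [(t, simulate_summary(t)) for t in [0.5 + 0.05 * i for i in range(40)]]

# Least-squares fit of the regression s ~ a + b * theta; this fitted line is
# what centres the quasi-likelihood proposal in the ABC-MCMC chain.
n = len(pilot)
mean_t = sum(t for t, _ in pilot) / n
mean_s = sum(s for _, s in pilot) / n
b = (sum((t - mean_t) * (s - mean_s) for t, s in pilot)
     / sum((t - mean_t) ** 2 for t, _ in pilot))
a = mean_s - b * mean_t
```

With the fitted line in hand, the observed summary statistic can be mapped back to the parameter scale to centre a Gaussian proposal, which is the constant-variance case the abstract mentions.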

    Deep Learning and Bayesian Calibration Approach to Hourly Passenger Occupancy Prediction in Beijing Metro: A Study Exploiting Cellular Data and Metro Conditions

    In burgeoning urban landscapes, the proliferation of the populace necessitates swift and accurate urban transit solutions to cater to citizens' commuting requirements. A pivotal aspect of fostering optimized traffic management and ensuring resilient responses to unanticipated passenger surges is precisely forecasting hourly occupancy levels within urban subway systems. This study delineates a two-tiered model designed to address this imperative:
    1. Preliminary phase, employing a Feed Forward Neural Network (FFNN): In the initial phase, an FFNN is employed to gauge the occupancy levels across the subway stations. The FFNN, a class of artificial neural networks, is well suited to this task because it can learn from the data and make predictions without being explicitly programmed for the task. Through a series of interconnected nodes (neurons) arranged in layers, the FFNN processes the input data, adjusts its weights based on the error of its predictions, and optimizes the network for accurate forecasting. For the random process of occupancy levels in time and space, this phase performs the so-called process filtration: the underlying patterns and dynamics of subway occupancy are captured and represented in a structured format, ready for subsequent analysis. The estimates garnered from this phase form the foundation of the subsequent modelling stage.
    2. Subsequent phase, implementing a Bayesian Proportional-Odds model with hourly random effects: With the FFNN estimates at hand, the study transitions to a Bayesian Proportional-Odds Model, which is particularly apt for scenarios where the response variable is ordinal, as is the case for occupancy levels (Low, Medium, High). The Bayesian framework, underpinned by the principles of probability, incorporates prior distributions on the model parameters and updates this knowledge with the observed data to make informed predictions. The distinctive feature of this model is a random effect for hours, which acknowledges the inherent variability across different hours of the day; this is paramount in urban transit systems, where passenger influx varies significantly with the hour.
    The synergy of these two models yields calibrated estimations of occupancy levels, both conditionally (relative to the sample) and unconditionally (on a detached test set). This dual-phase methodology furnishes analysts with robust and reliable insight into the quality of the model's predictions and, in turn, provides a data-driven foundation for informed decisions in real-time traffic management, emergency-response planning, and the overall operational optimization of urban subway systems. The model expounded in this study is presently under scrutiny for potential deployment by the Beijing Metro Group Ltd., a practical stride towards embracing sophisticated analytical models to ameliorate urban transit management, thereby contributing to the broader objective of fostering sustainable and efficient urban living environments amidst the surging urban populace.
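The proportional-odds step of the second phase can be sketched as follows. The cutpoints and the latent score standing in for the FFNN output are illustrative, not fitted values.

```python
import math

def ordinal_probs(score, cutpoints=(-1.0, 1.0)):
    """Proportional-odds mapping: P(level <= k) = logistic(c_k - score);
    successive differences give the Low / Medium / High probabilities.
    The cutpoints here are assumed, not estimated from metro data."""
    logistic = lambda x: 1.0 / (1.0 + math.exp(-x))
    cum = [logistic(c - score) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# A latent score (e.g. an hour-adjusted FFNN estimate) is turned into a
# probability over the three ordinal occupancy levels.
p_low, p_med, p_high = ordinal_probs(0.3)
```

In the full model, the score would include the hourly random effect, shifting all cumulative logits for that hour at once, which is exactly the proportional-odds assumption.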

    A Bayesian Spatio-temporal model for predicting passengers' occupancy at Beijing Metro

    This work focuses on predicting passenger flow at Beijing Metro stations and assessing its uncertainty with a Bayesian Spatio-temporal model. Forecasting is essential for Metro operation management, for instance for automatically adjusting train operation diagrams or planning crowd-regulation measures. Unlike usual machine learning prediction algorithms, the proposed model provides prediction uncertainty conditional on the available data, a critical feature that sets it apart. The Bayesian Spatio-temporal model for areal Poisson counts includes random effects for stations and days. On a test set, the fitted model attains a prediction accuracy that meets the standards of the Beijing Metro enterprise.
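A minimal sketch of how the random effects enter such an areal Poisson model: the log-rate combines a baseline with station and day effects. All values are illustrative; the actual model is fitted in a fully Bayesian way, which is what yields the prediction uncertainty.

```python
import math

def predicted_rate(base, station_effect, day_effect):
    """log lambda = base + u_station + v_day, so lambda = exp(...);
    lambda is the expected passenger count for that station and day."""
    return math.exp(base + station_effect + day_effect)

# Hypothetical effects: a busier-than-average station on a quieter day.
lam = predicted_rate(base=6.0, station_effect=0.4, day_effect=-0.2)
# A Poisson predictive interval around lam conveys the uncertainty that the
# abstract emphasises, conditional on the posterior of the three effects.
```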

    Modulation of genetic associations with serum urate levels by body-mass-index in humans

    Document written by a large number of authors; only the first author and those affiliated with UC3M are referenced. We tested for interactions between body mass index (BMI) and common genetic variants affecting serum urate levels, genome-wide, in up to 42,569 participants. Both stratified genome-wide association (GWAS) analyses, in lean, overweight and obese individuals, and regression-type analyses in a non-BMI-stratified overall sample were performed. The former did not uncover any novel locus with a major main effect, but supported modulation of effects for some known and potentially new urate loci. The latter highlighted a SNP at RBFOX3 reaching genome-wide significance (effect size 0.014, 95% CI 0.008-0.02, P_inter = 2.6 × 10^-8). Two top loci in the interaction-term analyses, RBFOX3 and ERO1LB-EDARADD, also displayed suggestive differences in main effect size between the lean and obese strata. All top-ranking loci for urate effect differences between BMI categories were novel, and most had effects of small magnitude but opposite direction between strata. They include the locus RBMS1-TANK (men, P_diff lean-overweight = 4.7 × 10^-8), a region that has been associated with several obesity-related traits, and TSPYL5 (men, P_diff lean-overweight = 9.1 × 10^-8), which regulates adipocyte-produced estradiol. The top-ranking known urate locus was ABCG2, the strongest known gout-risk locus, with an effect halved in obese compared to lean men (P_diff lean-obese = 2 × 10^-4). Finally, pathway analysis suggested a role for N-glycan biosynthesis as a prominent urate-associated pathway in the lean stratum. These results illustrate a potentially powerful way to monitor changes occurring in an obesogenic environment.
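The between-strata comparisons reported above (the P_diff values) amount to testing whether a SNP's effect differs between BMI strata. A minimal sketch with hypothetical effect sizes and standard errors:

```python
import math

def effect_difference_z(beta1, se1, beta2, se2):
    """z statistic for H0: beta1 == beta2, with effects estimated in
    independent strata (e.g. lean vs obese), so the variances add."""
    return (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Illustrative per-stratum SNP effects on serum urate, not values from the study.
z = effect_difference_z(beta1=0.22, se1=0.03, beta2=0.11, se2=0.04)
# A two-sided p-value follows from the standard normal tail of |z|.
```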

    Default prior distributions from quasi- and quasi-profile likelihoods.

    In some problems of practical interest, a standard Bayesian analysis can be difficult to perform. This is true, for example, when the class of sampling parametric models is unknown or when robustness with respect to data or to model misspecification is required. These situations can be usefully handled with a posterior distribution for the parameter of interest based on a pseudo-likelihood function derived from estimating equations, i.e. on a quasi-likelihood, together with a suitable prior distribution. The aim of this paper is to propose and discuss the construction of a default prior distribution for a scalar parameter of interest to be used together with a quasi-likelihood function. We show that the proposed default prior can be interpreted as a Jeffreys-type prior, since it is proportional to the square root of the expected information derived from the quasi-likelihood. The frequentist coverage of credible regions based on the proposed procedure is studied through Monte Carlo simulations in the context of robustness theory and of generalized linear models with overdispersion.
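For a concrete instance of the construction, consider a quasi-Poisson model with variance phi * mu: the expected quasi-information per observation is 1 / (phi * mu), so the proposed default prior is proportional to mu^(-1/2), recovering the Jeffreys prior for the Poisson mean. A sketch, with the quasi-Poisson choice being this note's example rather than the paper's:

```python
import math

def jeffreys_type_prior(mu, phi=1.0):
    """Default prior proportional to the square root of the expected
    quasi-information. For the quasi-Poisson score (y - mu) / (phi * mu)
    the information is 1 / (phi * mu), so the prior is mu ** -0.5 up to a
    constant (the dispersion phi is absorbed into the normalisation)."""
    info = 1.0 / (phi * mu)     # expected quasi-information per observation
    return math.sqrt(info)      # unnormalised default prior density

ratio = jeffreys_type_prior(1.0) / jeffreys_type_prior(4.0)
# the prior density at mu = 1 is twice that at mu = 4, matching mu ** -0.5
```

Because the dispersion only rescales the prior by a constant, the resulting default prior is the same whether or not overdispersion is present, which is part of the robustness appeal described above.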